Skip to content

fix: handle NUL byte in output_dataclip on step:complete#4910

Open
mvanhorn wants to merge 1 commit into
OpenFn:mainfrom
mvanhorn:fix/4893-output-dataclip-nul-byte
Open

fix: handle NUL byte in output_dataclip on step:complete#4910
mvanhorn wants to merge 1 commit into
OpenFn:mainfrom
mvanhorn:fix/4893-output-dataclip-nul-byte

Conversation

@mvanhorn

Copy link
Copy Markdown

Description

This PR fixes a crash on step:complete. When a worker reports an
output_dataclip whose decoded JSON contains a NUL byte (0x00),
Lightning.Runs.Handlers.CompleteStep raised an unhandled Postgrex.Error
(code 22P05, untranslatable_character) while inserting the Dataclip,
because PostgreSQL cannot store 0x00 in text/jsonb. The raise propagated
out of maybe_save_dataclip/2 and failed the whole reply, leaving the step/run
without a clean completion (Sentry LIGHTNING-13C).

The fix detects the unstorable NUL byte before the insert in the third
maybe_save_dataclip/2 clause and returns {:error, changeset} instead of
letting Repo.insert/1 raise. Detecting before the insert (rather than rescuing
the Postgrex.Error) keeps the surrounding Repo.transact/1 transaction from
being poisoned, and CompleteStep.call/2's existing with already propagates
the error tuple to the caller. Worker-side handling of the error response lives
in a separate repo and is out of scope.

Two details worth calling out:

  • The check runs on the body that is actually persisted. Dataclip.new/1
    strips the unpersisted top-level configuration key via
    remove_configuration/1, so the guard reads
    Ecto.Changeset.get_field(changeset, :body) after that stripping. A NUL byte
    that lives only under configuration never reaches PostgreSQL and is allowed
    through.
  • The returned error changeset is built without the dataclip body.
    LightningWeb.ChannelHelpers.reply_with/2 sends inspect(error) to Sentry,
    so a changeset carrying changes.body would leak the decoded job output into
    Sentry. The error changeset carries only the :body error message.

Closes #4893

Validation steps

  1. mix test test/lightning/runs_test.exs - three new tests in the
    describe "complete_step/2" block cover: a NUL byte at the top level returns
    {:error, %Ecto.Changeset{}} (no raise), a NUL byte nested in an array
    element is also caught, and a NUL byte present only under top-level
    configuration still completes with {:ok, _step} (since that key is not
    persisted).
  2. The existing complete_step/2 happy-path tests still pass unchanged (normal
    output dataclips insert and complete as before).

Additional notes for the reviewer

  1. The NUL detection (contains_null_byte?/1) walks decoded JSON maps, lists,
    and binaries, checking both keys and values, so it catches NUL bytes at any
    depth.
  2. This is the Lightning-side half discussed on the issue thread (reject rather
    than mutate user data). The worker not yet respecting an error response on
    step:complete is tracked separately.

AI Usage

Please disclose whether you've used AI anywhere in this PR (it's cool, we just
want to know!):

  • I have used Claude Code
  • I have used another model
  • I have not used AI

You can read more details in our
Responsible AI Policy

Pre-submission checklist

  • I have performed an AI review of my code (we recommend using /review
    with Claude Code)
  • I have implemented and tested all related authorization policies.
    (e.g., :owner, :admin, :editor, :viewer) - n/a, this is an
    internal data-persistence guard with no authorization surface
  • I have updated the changelog.
  • I have ticked a box in "AI usage" in this PR

Detect an unstorable NUL byte (0x00) in the decoded output_dataclip body
before inserting the Dataclip and return an {:error, changeset} instead of
letting Repo.insert raise an unhandled Postgrex.Error (22P05). PostgreSQL
cannot store 0x00 in text/jsonb, so the prior code crashed the whole
step:complete reply. Detecting before insert avoids poisoning the
surrounding Repo.transact transaction.

Closes OpenFn#4893
@github-project-automation github-project-automation Bot moved this to New Issues in Core Jun 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: New Issues

Development

Successfully merging this pull request may close these issues.

step:complete crashes with Postgrex 22P05 when output_dataclip body contains a NUL byte

1 participant